Skip to content

Add core_excellent_cohort_revenue model#1

Open
zingleai wants to merge 1 commit intomainfrom
dp/6c6b606f
Open

Add core_excellent_cohort_revenue model#1
zingleai wants to merge 1 commit intomainfrom
dp/6c6b606f

Conversation

@zingleai
Copy link
Copy Markdown
Owner

@zingleai zingleai commented Feb 22, 2026

Summary

Creates a new gold-layer model core_excellent_cohort_revenue that shows total revenue from the Excellent rating cohort.

Changes

  • New model: core_excellent_cohort_revenue (gold layer, table materialization)
  • Filters dim_products by rating_tier = 'Excellent' and aggregates revenue, profit, units sold, and margin metrics

Lineage

graph LR
    dim_products --> core_excellent_cohort_revenue
Loading

Generated by Data Portal

Made with Cursor

Summary by CodeRabbit

  • New Features
    • Introduced a new data model that aggregates revenue and performance metrics for products with Excellent ratings at the cohort level. The model provides comprehensive insights including total revenue, gross profit, units sold, product count, average rating, and margin percentage, enabling advanced business intelligence and cohort-level analysis.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 22, 2026

📝 Walkthrough

Walkthrough

A new dbt SQL model is introduced that filters products with Excellent rating tier and aggregates revenue, profit, units sold, and margin metrics at the cohort level.

Changes

Cohort / File(s) Summary
Excellent Cohort Revenue Model
dbt/models/marts/core/core_excellent_cohort_revenue.sql
New mart model that filters dim_products by Excellent rating tier and aggregates key metrics including product count, total revenue, gross profit, units sold, and average rating and margin percentage.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hopping through the data so bright,
Excellent cohorts now in sight,
Revenue and margin combined with care,
New metrics aggregated with flair!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: adding a new dbt SQL model named core_excellent_cohort_revenue, which matches the file changes shown in the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dp/6c6b606f

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
dbt/models/marts/core/core_excellent_cohort_revenue.sql (5)

36-36: Avoid select * — enumerate columns explicitly.

select * couples this model's output schema implicitly to final's column list. If the CTE changes (e.g., a column is added or reordered), downstream consumers see unannounced schema drift. Explicit column selection is idiomatic in dbt gold-layer models and makes contracts and documentation easier to enforce.

♻️ Proposed fix — explicit final select
-select * from final
+select
+    rating_tier,
+    product_count,
+    total_revenue,
+    total_gross_profit,
+    total_units_sold,
+    avg_rating,
+    avg_margin_pct
+from final
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` at line 36, The
model currently uses "select * from final" which exposes downstream schema
drift; replace the wildcard with an explicit, ordered list of columns drawn from
the final CTE (the output columns of the final CTE in
core_excellent_cohort_revenue.sql) so the model's contract is stable and
documented—open the final CTE definition to list each column name you want to
expose and update the final select to return that explicit column list in the
same order, ensuring any renamed or transformed fields use their final aliases.

36-36: Avoid select * in gold-layer models — enumerate columns explicitly.

select * couples the output schema implicitly to final's column list. Any upstream column addition or reorder becomes a silent schema change for downstream consumers. Explicit column projection is idiomatic in dbt mart models and makes contracts, documentation, and schema tests easier to maintain.

♻️ Proposed fix — explicit final select
-select * from final
+select
+    rating_tier,
+    product_count,
+    total_revenue,
+    total_gross_profit,
+    total_units_sold,
+    avg_rating,
+    avg_margin_pct
+from final
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` at line 36, The
model core_excellent_cohort_revenue.sql currently ends with "select * from
final" which leaks the upstream schema; replace that wildcard with an explicit,
ordered column projection listing every column produced by the CTE/table "final"
(e.g., id, cohort_date, revenue, cohort_size, etc.) so downstream contracts
remain stable—update the final SELECT in core_excellent_cohort_revenue.sql to
enumerate each column in the desired order, include any necessary aliased names,
and remove the "select *" usage.

6-15: product_name and product_category are selected in source but never referenced in final.

Both columns are dead weight in the intermediate CTE — they are projected but never consumed by the aggregation, adding unnecessary I/O.

♻️ Proposed fix — drop unused columns from `source`
     select
         product_id,
-        product_name,
-        product_category,
         avg_rating,
         rating_tier,
         total_units_sold,
         product_total_revenue,
         product_gross_profit,
         margin_pct
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 6 - 15,
The intermediate CTE named source is projecting product_name and
product_category but those fields are never used in the downstream final
aggregation; remove product_name and product_category from the SELECT list in
the source CTE (the projection that currently lists product_id, product_name,
product_category, avg_rating, rating_tier, total_units_sold,
product_total_revenue, product_gross_profit, margin_pct) so only the actually
consumed columns (e.g., product_id, avg_rating, rating_tier, total_units_sold,
product_total_revenue, product_gross_profit, margin_pct) are emitted, then run
the model/tests to confirm no references remain to
product_name/product_category.

6-15: product_name and product_category are selected but never used.

Both columns are pulled into source but neither appears in the final aggregation. They add unnecessary I/O and widen the intermediate result set with no benefit.

♻️ Proposed fix — drop unused columns from `source`
     select
         product_id,
-        product_name,
-        product_category,
         avg_rating,
         rating_tier,
         total_units_sold,
         product_total_revenue,
         product_gross_profit,
         margin_pct
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 6 - 15,
The source SELECT currently projects product_name and product_category but they
are never used in the final aggregation; remove product_name and
product_category from the source projection (the SELECT that builds "source") so
the intermediate row width is reduced, and ensure there are no remaining
references to product_name or product_category in downstream CTEs (e.g., final)
or GROUP/BY clauses; keep product_id and other aggregated columns (avg_rating,
rating_tier, total_units_sold, product_total_revenue, product_gross_profit,
margin_pct) unchanged.

29-30: Prefer CAST(... AS NUMERIC) for improved portability across warehouse adapters.

While ::numeric syntax works on the current Snowflake target, it's not supported in BigQuery or some other platforms. Since the project includes StarRocks configuration, using the standard CAST() function syntax (or dbt's {{ dbt.cast() }} macro) would make these lines compatible with broader adapter support and reduce friction if warehouse targets expand.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 29 -
30, Replace the Postgres-style cast operator used in the round expressions with
a portable CAST call: for the expressions producing avg_rating and
avg_margin_pct (the lines referencing avg(avg_rating)::numeric and
avg(margin_pct)::numeric), change the "::numeric" casts to the standard CAST(...
AS NUMERIC) (or use dbt.cast() if you prefer macro abstraction) so
round(avg(avg_rating)::numeric, 2) and round(avg(margin_pct)::numeric, 4) become
portable across adapters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql`:
- Line 30: Replace the simple arithmetic average used to compute avg_margin_pct
with a revenue-weighted margin: instead of using avg(margin_pct) for the
avg_margin_pct alias, compute sum(product_gross_profit) divided by
nullif(sum(product_total_revenue), 0) (and then round/cast as numeric to 4
decimals) so the metric weights by revenue and avoids division-by-zero; update
the expression that defines avg_margin_pct in this CTE accordingly (referencing
avg_margin_pct, margin_pct, product_gross_profit, and product_total_revenue).
- Around line 29-30: Replace the unweighted avg(margin_pct) with a
revenue-weighted margin computed as total_gross_profit / NULLIF(total_revenue,
0) so large-revenue items carry appropriate weight and avoid divide-by-zero;
update the column currently aliased as avg_margin_pct to use this expression
(with the same rounding/casting as other aggregates, e.g.,
round((total_gross_profit::numeric / NULLIF(total_revenue, 0))::numeric, 4) as
avg_margin_pct) and ensure total_gross_profit and total_revenue are available in
the same SELECT scope where avg_margin_pct is produced.
- Around line 1-3: Add a dbt config block to explicitly set materialization to
table so the model core_excellent_cohort_revenue uses table materialization
regardless of project defaults; insert a top-of-file line calling the Jinja
config helper ({{ config(materialized='table') }}) in
core_excellent_cohort_revenue.sql so the model behavior matches the PR intent
and downstream performance expectations.
- Around line 1-3: Add an explicit dbt config block to enforce table
materialization for the model in core_excellent_cohort_revenue.sql: insert a
top-of-file "{{ config(materialized='table') }}" config (before any SQL or
comments) so the model uses table materialization regardless of dbt_project.yml
defaults and matches the PR intent; ensure the config appears above the
SELECT/CTE logic in the existing model file.

---

Nitpick comments:
In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql`:
- Line 36: The model currently uses "select * from final" which exposes
downstream schema drift; replace the wildcard with an explicit, ordered list of
columns drawn from the final CTE (the output columns of the final CTE in
core_excellent_cohort_revenue.sql) so the model's contract is stable and
documented—open the final CTE definition to list each column name you want to
expose and update the final select to return that explicit column list in the
same order, ensuring any renamed or transformed fields use their final aliases.
- Line 36: The model core_excellent_cohort_revenue.sql currently ends with
"select * from final" which leaks the upstream schema; replace that wildcard
with an explicit, ordered column projection listing every column produced by the
CTE/table "final" (e.g., id, cohort_date, revenue, cohort_size, etc.) so
downstream contracts remain stable—update the final SELECT in
core_excellent_cohort_revenue.sql to enumerate each column in the desired order,
include any necessary aliased names, and remove the "select *" usage.
- Around line 6-15: The intermediate CTE named source is projecting product_name
and product_category but those fields are never used in the downstream final
aggregation; remove product_name and product_category from the SELECT list in
the source CTE (the projection that currently lists product_id, product_name,
product_category, avg_rating, rating_tier, total_units_sold,
product_total_revenue, product_gross_profit, margin_pct) so only the actually
consumed columns (e.g., product_id, avg_rating, rating_tier, total_units_sold,
product_total_revenue, product_gross_profit, margin_pct) are emitted, then run
the model/tests to confirm no references remain to
product_name/product_category.
- Around line 6-15: The source SELECT currently projects product_name and
product_category but they are never used in the final aggregation; remove
product_name and product_category from the source projection (the SELECT that
builds "source") so the intermediate row width is reduced, and ensure there are
no remaining references to product_name or product_category in downstream CTEs
(e.g., final) or GROUP/BY clauses; keep product_id and other aggregated columns
(avg_rating, rating_tier, total_units_sold, product_total_revenue,
product_gross_profit, margin_pct) unchanged.
- Around line 29-30: Replace the Postgres-style cast operator used in the round
expressions with a portable CAST call: for the expressions producing avg_rating
and avg_margin_pct (the lines referencing avg(avg_rating)::numeric and
avg(margin_pct)::numeric), change the "::numeric" casts to the standard CAST(...
AS NUMERIC) (or use dbt.cast() if you prefer macro abstraction) so
round(avg(avg_rating)::numeric, 2) and round(avg(margin_pct)::numeric, 4) become
portable across adapters.

Comment on lines +1 to +3
-- Total revenue summary for products in the Excellent rating cohort (avg_rating >= 4.5)
-- Filters dim_products to rating_tier = 'Excellent' and aggregates revenue,
-- profit, units sold, and margin metrics at the cohort level.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing {{ config(materialized='table') }} — intent not enforced.

The PR explicitly states this model should use table materialization, but no {{ config() }} block exists. Without it, dbt uses whatever default is set in dbt_project.yml for this path (marts/core). If that default is view or ephemeral, the model will silently behave differently from the stated intent, impacting downstream query performance and dependency expectations.

🛠️ Proposed fix — add config block
+{{ config(materialized='table') }}
+
 -- Total revenue summary for products in the Excellent rating cohort (avg_rating >= 4.5)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 1 - 3,
Add a dbt config block to explicitly set materialization to table so the model
core_excellent_cohort_revenue uses table materialization regardless of project
defaults; insert a top-of-file line calling the Jinja config helper ({{
config(materialized='table') }}) in core_excellent_cohort_revenue.sql so the
model behavior matches the PR intent and downstream performance expectations.

⚠️ Potential issue | 🟠 Major

Missing {{ config(materialized='table') }} — stated intent not enforced.

The PR explicitly targets table materialization, but no {{ config() }} block is present. Without it, dbt resolves materialization from dbt_project.yml defaults for this path. If that default is view or ephemeral, the model silently behaves differently from the stated intent, impacting downstream query performance and dependency contracts.

🛠️ Proposed fix — add config block
+{{ config(materialized='table') }}
+
 -- Total revenue summary for products in the Excellent rating cohort (avg_rating >= 4.5)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- Total revenue summary for products in the Excellent rating cohort (avg_rating >= 4.5)
-- Filters dim_products to rating_tier = 'Excellent' and aggregates revenue,
-- profit, units sold, and margin metrics at the cohort level.
{{ config(materialized='table') }}
-- Total revenue summary for products in the Excellent rating cohort (avg_rating >= 4.5)
-- Filters dim_products to rating_tier = 'Excellent' and aggregates revenue,
-- profit, units sold, and margin metrics at the cohort level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 1 - 3,
Add an explicit dbt config block to enforce table materialization for the model
in core_excellent_cohort_revenue.sql: insert a top-of-file "{{
config(materialized='table') }}" config (before any SQL or comments) so the
model uses table materialization regardless of dbt_project.yml defaults and
matches the PR intent; ensure the config appears above the SELECT/CTE logic in
the existing model file.

Comment on lines +29 to +30
round(avg(avg_rating)::numeric, 2) as avg_rating,
round(avg(margin_pct)::numeric, 4) as avg_margin_pct
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

avg(margin_pct) is a simple arithmetic mean — consider a revenue-weighted margin instead.

A simple average of per-product margin percentages gives equal weight to a $10-revenue product and a $10M-revenue product. For a cohort revenue model, the standard approach is a weighted average derived directly from the already-aggregated sums:

total_gross_profit / NULLIF(total_revenue, 0)

This is more analytically sound and avoids a division-by-zero risk in the process.

♻️ Proposed fix — replace simple avg with revenue-weighted margin
         round(avg(avg_rating)::numeric, 2)  as avg_rating,
-        round(avg(margin_pct)::numeric, 4)  as avg_margin_pct
+        round(
+            (sum(product_gross_profit) / nullif(sum(product_total_revenue), 0))::numeric,
+            4
+        )                                   as avg_margin_pct
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` around lines 29 -
30, Replace the unweighted avg(margin_pct) with a revenue-weighted margin
computed as total_gross_profit / NULLIF(total_revenue, 0) so large-revenue items
carry appropriate weight and avoid divide-by-zero; update the column currently
aliased as avg_margin_pct to use this expression (with the same rounding/casting
as other aggregates, e.g., round((total_gross_profit::numeric /
NULLIF(total_revenue, 0))::numeric, 4) as avg_margin_pct) and ensure
total_gross_profit and total_revenue are available in the same SELECT scope
where avg_margin_pct is produced.

sum(product_gross_profit) as total_gross_profit,
sum(total_units_sold) as total_units_sold,
round(avg(avg_rating)::numeric, 2) as avg_rating,
round(avg(margin_pct)::numeric, 4) as avg_margin_pct
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

avg(margin_pct) is a simple arithmetic mean — use a revenue-weighted margin instead.

A simple average of per-product margin percentages assigns equal weight to every product regardless of its revenue contribution. For a cohort revenue model this produces a misleading metric. The revenue-weighted equivalent is already computable from the aggregates already in this CTE:

sum(product_gross_profit) / nullif(sum(product_total_revenue), 0)

This also avoids a potential division-by-zero if total_revenue is zero.

♻️ Proposed fix — replace simple avg with revenue-weighted margin
-        round(avg(margin_pct)::numeric, 4)  as avg_margin_pct
+        round(
+            (sum(product_gross_profit) / nullif(sum(product_total_revenue), 0))::numeric,
+            4
+        )                                   as avg_margin_pct
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/core_excellent_cohort_revenue.sql` at line 30, Replace
the simple arithmetic average used to compute avg_margin_pct with a
revenue-weighted margin: instead of using avg(margin_pct) for the avg_margin_pct
alias, compute sum(product_gross_profit) divided by
nullif(sum(product_total_revenue), 0) (and then round/cast as numeric to 4
decimals) so the metric weights by revenue and avoids division-by-zero; update
the expression that defines avg_margin_pct in this CTE accordingly (referencing
avg_margin_pct, margin_pct, product_gross_profit, and product_total_revenue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant